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FOREWORD 


This  paper  examines  a  class  of  variable  metric  method*  of  minlmiziny 
unconstrained  functions  that  arise  when  the  Sequential  Unconstrained  Minimi¬ 
zation  Technique  (SUMT)  is  applied  to  general  nonlinear  programming  prob¬ 
lems.  The  methods  considered  require  a  knowledge  of  only  the  first  derivatives 
ui  the  function  to  be  minimized  but  proceed  to  estimate  the  inverse  hessian  of 
second  partial  derivatives  during  the  course  of  a  series  of  one-dimensional 
minimizations. 

Three  new  algorithms  and  the  Fletcher- Powell- Davidon  algorithm  are 
derived  using  simple  properties  of  a  general  solution  to  the  problem  of  esti¬ 
mating  the  inverse  hessian.  Results  of  numerical  calculations  for  several 
examples  show  the  relative  merits  of  the  new  algorithms  compared  to  several 
in  current  use. 


Nichole*  M.  Smith 
Head,  Advanced  Reaearch  Department 
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ABSTRACT 


Two  basic  approaches  to  <he  generation  of  conjugate  di¬ 
rections  ate  considered  for  the  problem  of  unconstrained  minimi¬ 
zation  oi  quadratic  functions.  The  first  approach  results  in  a 
projected  gradient  algorithm  that  gives  “n  step”  convergence  for  a 
quadratic.  The  second  approach  is  based  on  the  generalized  solution 
of  a  set  of  undetermined  linear  equations,  various  forms  of  which 
generate  various  new  algorithms  all  giving  n -step  convergence. 
One  of  them  is  the  Fletcher  and  Powell  modification  of  Davldon’s 
method  - 

Results  of  an  extensive  numerical  comparison  of  these  meth¬ 
ods  with  the  Newton- Raphson  method  and  Fletcher- Reeves  method 
are  included. 
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1.  INTRODUCTION 
General 

Let  A  be  an  inn  positive  definite  symmetric  matrix;  let  b  be  an  arbitrary 
n  vector  and  c  an  arbitrary  constant. 

Consider  the  problem  of  finding  the  minimizing  n  vector  x  *  x*,  for  a 
quadratic  function  f(x)  defined  by 

f(i)  »  ‘jx'Ai  »  ii  i  i  (l) 

The  methods  considered  here,  called  variously  “variable  metric,"1  “quasi- 
Newton,"1’3  or  “large-step  gradient  methods, consist  of  selecting  an  h  x  n 
matrix  H,  at  stage  i  and  forming  the  direction  d,  =  H!g,  where  a,  is  the  gradient 
of  f(x)  at  **.  A  step  of  length  a,  is  chosen  so  that  x,  +  o,d,  is  the  minimum 
of  f{x,  +  a,d, ),  i.e.,  where  djaul  =0.  H,  is  then  updated  using  (*|tj  -  x,)and 
(gj,i  ~  8j).  If  Hj  ■  I  this  is  the  method  of  steepest  descent.  The  Newton- 
Raphson  method  Is  obtained  with  H,  =  A'1.  wever,  on  a  nonquadratic  func¬ 
tion  A  or  its  equivalent  the  hessian  of  fU  i  u  ,y  not  be  available.  It  is  then  of 
general  interest  to  examine  methods  that  utuizc  only  first-derivative  informa¬ 
tion  and  that  in  addition  may  estimate  A. 

Since,  as  is  reviewed  in  Sec  2,  a  quadratic  can  be  minimized  in  it  steps 
if  dQ,  dj ,  d„_ j  are  conjugate  directions,  this  paper  studies  a  class  of  H(  matrices 
that  will  generate  conjugal  directions.  In  Sec  3,  H,  is  chosen  as  a  projection 
matrix  and,  in  Sec  4,  H,  is  chosen  as  a  solution  to  the  equation  H ,y,  ■  Sr  The 
Fletcher- Powell- Davidon 5  algorithm  is  sho  t  to  be  a  member  of  this  latter 
class.  A  numerical  comparison  of  several  n  v  algorithms  with  the  Fletcher- 
Powell- Davidon  and  the  Fletcher- Reeves  algorithm  is  given  in  Sec  D. 
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Notation 


At  iteration  t  the  following  column  vectors  occur: 

x,  is  the  current  solution. 

8l  is  the  gradient  of  f(x)at 

If,  is  the  current  direction  matrix  or  metric, 
d,  ia  the  search  direction  from  x, ,  d‘dt  “  1. 
s,  *  *|,i  -  *,  -  atdt  is  the  step  in  x,. 

y,  *■  Si+ 1  '  3,  r  As,  is  the  step  in  8t. 

a,  is  the  step  length,  a  negative  scalar, 
gj  denotes  8,  transpose,  a  row  vector. 

S,  -  t*0,  Sj,  .  .  .  ,  s,_!  ]  denotes  a  matrix  with  columns  s0,  .  .  .  ,  s,_ 
and  also  without  ambiguity  [s0,  s( ,  .  .  .  ,  s,_i  1  denotes  the  sub¬ 
space  spanned  by  vectors  s0,  s, ,  .  .  .  ,  $, . 

Y,  “  ty0.  y\ . y,_ j  ]  denotes  an  mxi  matrix  with  columns  y , , 

2.  PROPERTIES  OF  CONJUGATE  DIRECTIONS 

It  is  convenient  to  isolate  the  properties  of  conjugacy  from  the  problem 
of  generating  conjugate  directions  as  discussed  in  later  sections. 

Definition 

A  set  of  m  independent  directs  ns  dy,  d,,  .  .  .  ,  dH..j  are  conjugate  with 
respect  to  a  positive-definite  symmetric  matrix  A  iffl 

djAd,  0  0  M  /  j  <  M  1 

2,'id,  -0  0  •  l  <  n  1  ^ 

Any  point  x  €  EH  can  be  represented  in  terms  of  d0,  .  .  .  ,  dn_t  as  follows: 
Let 

»  i 

1  -  A,d, 

»  0  11 

then 

A,  I  'Ad,  Jj  Vd, 


Similarly  the  quadratic  f(x)  =  '/}x  'Ax  +  x'b  +  c  can  be  decomposed  into  m  in¬ 
dependent  terms. 


(3) 


Hi!  *'*')*  *0oAl<‘)’ 

’  ,i  ('«}<*<* a, b-a,).t 


Thus  any  quadratic  can  be  minimized  in  n  steps  by  minimizing  the  n  terms 
independently. 

Define  n  x  i  matrices  V,  and  S, 

V.  ‘  !>0'  >i . h -l' 

s.  iJo-  *i . si  i' 

Since  s,  *  x|4,  -  x, ,  y,  *  g,,,  -  g,  and  g,  =  Ax,  +  b  then, 

>i  -  *s, 

and 

V('  Sj  t  S(‘>j  »0  1  <  i  <  j  <  »  -  1 

when  the  steps  s0 ,  Sj  ,  .  .  .  are  conjugate. 

Now  consider  two  simple  results  that  hold  for  independent  direction  i  d, . 
Lemma  1.  The  point  x,  =  x0  +  is  the  minimum  of  f(x)  over  the 

subspace  [d0 ,  d,  ,  .  .  .  ,  d,.  t  ]  if  and  only  if  S'g,  »  0. 

Proof:  If  f( Jt, )  is  a  minimum  in  direction  dj  then 

i  djfl,  -  0  fo»  0  <  |  <  I  -  1 ,  :.t.,  S’fl,  *  0 

Since  f(x)  is  strictly  convex  let  x,  =  x,  +  Sj*c  •  j df  ;  then  fit, )  a  fix, )  + 

g,'(x,  -  x,),  equality  occurring  only  when  x,  =  x,. 

If  s/g,  =  0,  i.eM  d' gj  =  0  for  0  s  ]  <  *  —  1;  then  t ,  *  0  implies  fix, )  > 
and  fix, }  is  the  minimum. 

Lemma  2,  Suppose  at  x, ,  S/g,  =  0;  if  s,  satisfies  V/s,  =  0  and  s/flol  =0> 
then  S,tlg)+i  *0, 

Proof:  On  a  quadratic  function  Y ‘s,  =  S/y,  =  S/i gN2  -  a,+ 1 )  =  Vfli*2  -  0.  If 
in  addition  s/g, ,2  -  0,  then  by  definition  of  S|(l,  =  0, 

Lemma  1  provides  a  simple  characterization  of  the  progress  at  stage  i  , 
and  lemma  2  indicates  that  stepping  to  a  minimum  in  a  direction  orthogonal  to 
the  previous  gradient  changes  locates  the  minimum  over  a  larger  subspace. 
Note  that  in  neither  case  was  conjugacy  of  the  d,  required,  only  independence. 
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3.  THE  PROJECTED  GRADIENT  ALGORITHM 


As  Eq  4  shows,  one  way  of  generating  conjugate  directions  is  to  make 
successive  steps  orthogonal  to  previous  gradient  changes.  It  is  remarkable 
that  this  can  be  done  on  arbitrary  functions  fix)  and  produces  a  weaker  form 
of  conjugacy  discussed  elsewhere.7  A  result  similar  to  Theorem  1  has  been 
given  independently  by  Goldfarb,* 

Theorem  1 

Let  ft  be  a  symmetric  positive  definite  matrix  and  define  $  f ,  y,  by  the 
recursion 

i  -  o  ll0  *  I 

i  -  o  h(  t  i  -  RV'tv.RV,)  ‘y; 

-  win  l fix,  *  a(H,Rg,)l 

S1  +  i  -  [S,,  ij  +  i  -  *,1 
V,  +i  *  tV,,  gltl  -  9,1 

then  either  for  some  j  <  n 

-  0.  II,  0  «fld  fl,  -  0,  Ij  «  x* 
or  if  the  recursion  continues  to  j  *>  n 

H#  .  0  «„  *  0,  -  i* 

Proof:  If  go  =  0,  then  H0  =  I  and  i0  =  x*.  If  go  i  0,  then  do  *  H0R^,  i  0.  Equa¬ 
tion  6  requires  d0'g1  =  0,  i.e.f  «  -d^  g^tlyAd^  <  0,  and  consequently  both  s0 
and  y0  t  0.  Thus  Yj  =  [y0]  and  aj  =  [s0]  both  here  rank  1,  and  S/gj  »  s0'gj  =0. 

Proceeding  by  induction,  suppose  Y,  and  S,  have  rank  i  and  S,' g,  =  0.  Then 
H,  exists  and  is  a  projection  matrix  with  properties  H?  =  H,,  H 'Y,  ■  0.  H,RY,  =  0. 

Thus  from  Eq  S  each  new  direction  is  orthogonal  to  the  columns  of  Y,. 

Now  d,  •  H,Rg,  =  0  if  and  only  if  g,  =  0,  for  if  g<  j  0,  then  by  Eq  5  g(  6 
[y0,  >i»  •  •  •  »  Yi-i  3 .  he.,  for  some  u>,  g,  =  Y,m.  However,  S/g,  =  0  implies 
SfY,u<  =  S'ASjir  *  0  for  w  i  0,  which  contradicts  the  definiteness  of  A.  Thus 
H,Rg,  *  0  implies  g,  =  0.  • 

Suppose  HjRg,  i  0;  then  Eq  6  requires  d,'g|(1  =  0,  i.e.,  a,  =  -g|RH,'g,/ 
gfRH'Ag,  <  0  since  gt  is  not  a  linear  combination  of  y0,  y, ,  .  ,  .  ,  y,^.  Thus  s, 


(5) 

(6) 

(?) 

(8) 
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and  y,  are  nonzero.  The  direction  choice  implies  Y's,  *  0  and  as  a  result 
Y,  t  j  ,  Sul  have  rank  I  +  1.  Otherwise  for  some  w  i  0,  y(  =  Y,ui  and  S/y,  *  S,' 
AS(ui  ■  0  as  before.  Similarly  if  s,  *  s,u-  t  0,  Y's,  *  S,'AS,ti'  *  0  implies  u<  -  0. 

The  recursion  terminates  for  some  j  <  n  when  HjRgj  “  0,  which  requires 
=  0  and  =  i*. 

If  the  recursion  continues  to  i  =  h,  as  is  likely,  then,  since  by  induction 
Yh,  Sh  have  rank  n  and  =  0,  it  follows  that  ^*0,  *„  *  x*,  and  H„  ■>  0. 

A  convenient  algorithm  can  be  found  by  application  of  the  bordered  in¬ 
verse  lemma  in  App  A,  to 

h,r  =  r  -  rv,iv;rv,v  1y;r 


Algorithm  1 

M..,  -  Hr  <H(y,)<H,y(> 
Hq  -  « 

Then  Eq  6  is  replaced  by 

Wl 


(9) 


(10) 


Corollary  1,1,  If  Y,'s,  *  0  for  i  »  1 ,  2,  ,  .  .  ,  j ,  then  s0 ,  5j . are 

conjugate. 

Proof:  Y,'s,  <=  0  implies  s^As,  =  s/Asj,  =  0  for  k  <  i,i.e.,  s' \sk  =0 
0  s  t  i  k  s  j. 


R  allows  a  choice  other  than  I  for  the  initial  H0 ,  a  property  that  apparently 
minimizes  round-off  errors.1  However,  R  can  also  be  used  to  take  advantage 
of  any  partial  inverse  of  A. 

Suppose  A  has  a  partitioned  form 


A 


a22) 


Assume  A{  j  is  a  known  r  x  r  block  and  set 


Inserting  this  in  Eq  10,  a  simple  calculation  shows  that 

^  *  -  So  Hofk)  v  -  * 
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and  that 


(12) 


h 


ft;)  *(*;'  3(t)-L-.,\u 


Thus  the  top  r  components  of  g,  are  zero  after  this  "partial  inverse"  step. 

Let  50,  S| ,  ,  .  ,  ,  s,  |  be  f  unit  vectors  of  the  form  s{  ■  (0,0, ...  ,0, 1,0, ...  ,0) 
with  1  in  the  i+lth  position. 

Let  V,  *  A,sr  “  (a,',  ,  An, ) then  clearly  S,  and  V,  have  rank  r,  and  If 
e  gt  then  5,'g,  <*  0,  Hut  these  are  the  inductive  hypotheses  in  Theorem  1  at 
the  rth  stage,  which  yields, 

Corollary  1.2,  After  the  partial  inverse  Step  (Eqs  It)  and  il)  the  projected 
gradient  algorithm  is  at  stage  r  with  H,  defined  by 


i  - 


(4:)[(4:)(4:)]  '(4:) 


(13) 


It  will  terminate  in  not  more  than  h  -  r  further  steps. 

A  more  transparent  explanation  of  the  restart  is  to  note  that  if  Audtl  + 

A,2  d,  2  "0  where  d‘  *  (  d,',  ,  d'2 ),  then  the  top  r  components  of  g,  are  unchanged 
from  0.  This  is  equivalent  to  Eq  13.  (See  App  C  for  an  example.) 


i.  VARIABLE  METRIC  ALGORITHMS 

The  class  of  algorithms  in  this  section  are  based  on  the  following  idea. 

II  lij  satisfies  H,Y,  -  S,  and  steps  s0,  s, . s(  ,  were  obtained  by  minimiz¬ 

ing  down  independent  directions,  i.e.,  S,'g,  »  0,  then  the  direction  d,  =  H/g, 
Hind  step  s,  —  Ojd,  are  conjugate  to  s0,  s, ,  .  ,  .  ,  s,_[  ,  i.e.,  Y/s,  =  Y'H'g,  = 
atS/g,  ■  0.  Clearly  if  the  process  continues  to  stage  m  ,  HK  -  A-1,  and  all  the 
steps  are  conjugate.  Since  gH  Is  orthogonal  to  the  previous  n  steps  it  must 
be  zero. 

Now  consider  the  general  solution  to  the  equations  H,Y,  =  St.  This  has 
the  form  for  arbitrary  Z  of 

ii,  .  s,y;  *  :<i  -  v,y;>  (14) 

where  Y*  is  the  generalized  inverse  of  Y,  ,8  If  Y,  is  of  rank  i ,  Uien  Y'  ■ 

(  Y/Y,)"1  Y,'  and  has  the  property  that  Y,  Y*  is  a  projection  of  t”  onto  ty0, 

,],  In  addition  i*  =  Y,’ 6  minimizes  (Y,x  -  b  )'( Y,i  -  b  ). 
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Suppose  that  V,  has  rank  i  ;  then  It  will  be  convenient  to  deline  Yf  * 
t  Y'NYt )  v/H  where  h  i'i  a  positive  definite  symmetric  matrix.  Note  that 

«,  .  s,v?  •  »<»  -  v(v;> 

aieo  satisfies  H, V,  -  S(l  that  V, Y*  -  Y,<  Y(1f,  1’,)  V'^lM  also  is  a  projection  matrix, 
and  that  r*  *>  Y,*b  minimises  (Y,**-  b) '  H<Vfx  ~6)* 

Theorem  2,  General  Variable  Metric  Algorithm 

Let  R  and  H  be  positive  del  mite  symmetric  matrices  and  define  the 


algorithm  as  follows: 

for  i  «  0 

ll0  -  H 

(15) 

for  i  >  0 

H,  -  S,V*  fWI  -V.Vp 

(1«) 

where 

Y*  «od  V**  k*vt  lie  form  (V/lIV,)*1  Y ’ll 

(17) 

with  H  «  R  or  A-1  independently  for  each  term. 

,*i>  -  «><>  f<‘,  ♦a1n,‘s1) 

“t 

^  M 1  ”  ^  i :  ^i4i  ~  f0 

(18) 

(19) 

with 

Y|  -  tsi  *lli  -‘v! 

(20) 

Then  the  algorithm  terminates  for  some  i  ?  »  when  H'#t  =  0  implies  g,  *  0, 
i,  ai*.  Hi  *  n  tlienH,,  *■  A“l 

Proof:  If  *0  /  i*  then  gg  /  0.  Using  Eqs  15,  18,  and  20  the  first  step  results 
in  Yk  and  S,  having  rank  1  and  Sj'hj  ■  0. 

Proceeding  by  Induction,  suppose  that  at  stage  i  <  »,  /  0,  Y(  and  S(  have 

rank  i  and  Sfl,  m  0.  It  will  be  shown  that  this  is  true  for  i  + 1. 

Computing  the  direction  of  search,  H'a,  *  Y*'s's,  *  (I  -  Y**'Y')  R#(  =0  + 

0  ~  HY(  (Y'HY,)"1  Y,  ]  Rfl,  »  0  if  and  only  if  H-'R^t  [y0>  yt . y,j].  But  for 

H  “R  or  H  «  A"*,  *  0  would  require  «,  *  0.  Thus  H  'g,  t  0  if  g,  i  0. 
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Minimizing  at  stage  »,  a,  -  -  at' H /s/H ,  All i  o  if  and  only  it  r  d, 

If  H  *A-‘  then  «[«>,  -  gfa,  >0.  if  H  -  R,  *  gJH.R^H.'g,  >  0.  Thus  if  g,  t  0, 
a,  i  0.  By  constriction,  V ('(itll  -  «,)  ■  Y's,  -  0.  this  fact,  Eq  18,  and  lemma  2 
provide  that  sul'g)+1**0. 

Since  af  +  0,  *(il  - »(  and  gul  -g,  are  nonzero  and  Sj4, ,  Y(tl  in  Eq  itl  will 
have  rank  f  *  1.  Otherwise  y,  e  [y0,  Vi ,  .  .  ,  ,  y,.jj  and  since  ir'Js,  -  S'y,  “0, 
y,  -  0  and  similarly  s,  *  0,  a  contradiction. 

Thus  the  iteration  can  only  terminate  if  »,  =  0  for  which  i,  =  i*;  otherwise 
it  proceeds  until  i  -  n.  Here,  however,  has  rank  h  and  S'g„  -0  implies  o 
and  xH  =**.  =  A-*  by  construction  since  YwandSs  have  rank  u. 

Since  H  can  be  chosen  to  be  A"*  or  R  in  Eqs  16  and  17  there  are  four  pos¬ 
sible  algorithms  in  this  scheme, 

The  next  corollary  allows  for  the  fact  that  if  a  restart  Is  used  the  Initial 
directions  [s0,  $i(  .  .  .  ,  sj  are  not  necessarily  conjugate.  See  Corollaries 
l.i  and  2.1.  However  under  normal  operations  this  is  the  case. 

Corollary  2.1.  If  Y ,'s,  *  0  for  i  -  1,  2,  .  .  .  ,  j  <  «,  then  sUf  sl(  .  ,  ,  ,  sy  ; 
are  conjugate. 

Particular  algorithms  can  be  obtained  by  choosing  H  differently  for  each 
Y  *  in  Eq  16. 


Algorithm  2 
Choose 

h,  s,(s;v,)  ’s;  ♦  mi  y/s.v,)1  s;i  (21) 

This  corresponds  to  H  =  A  1  in  Eq  17.  Expanding  this  formula  using  the 
bordered  inverse  lemma  in  App  A, 

It.  -  S.  AV.'tr/y.'U  -  AY.'lf. 

(22) 


-  H,  ♦  (l,  -  H^Vs,  -  S,  AV('t('7y,'U  - 1,  ay;i  l, 

H0  -  R 


where 


A  s;y, 

When  used  In  Theorem  2,  the  projection  properties  of  H,  require  that  Y,'*,  *  0 
and  consequently 

H.  ,  *  11.  «  II.  -  H.V.Hf  J  /l.  V. 

(23) 


H..1  -  M, .  ts,  -  Hly(Ms,)>/sl*yl 


H0  *  R 
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This  particular  algorithm  is  due  independently  to  0,  P.  McCormick,  Note 
that  1A  general  H,  it*  unayimrietric, 


Choose 


H,  -  S.tV/HV, )  1  l.’H  *  HU  *  (V  MV  *  1  V  «! 

i  1  *  1  •  X  >  L  i 


This  corresponds  to  H  -  R  throughout  Theorem  2,  Expanding  this  formula  and 

using  the  projection  properties  of  H,  in  the  form  V,'st  -  0, 

II I. i  “  II,  *  H.v/mi; »,)  />, H,y, 

u  <a*> 

My  -  K 

Again  H,  will  be  unsymmetric  and  In  particular  here  H,'y,  *  H,y, , 

Algorithm  4 
Choose 

h,  -  s,(s;v,rls; .  an  v,tv;#*Vjrl v;m  (26) 

This  corresponds  to  U  -  A  1  in  the  first  Y ,*  and  H  -  R  in  the  second  V,*  of 

Eq  16.  Expanding  H,  and  using  the  projection  properties  of  H,  in  the  form 

yi'*t  ■  5/y,  *o. 

Hlt|  -  II,  (Up, VHly,>7y,,Hi>|  . 

H„ . «  <2” 

This  is  immediately  recognized  as  Fletcher  and  Powell’s  modification  of 

Davidon’o  algorithm.1  H,  is  symmetric  and  the  search  direction  H/g,  »  H,a, 
as  commonly  used. 

As  for  the  projection  matrix,  let  s, ,  i  *  0,1 . r  -  1  be  the  first  r  unit 

vectors  and  define  V,  3  A5(.  Then  if  g,  -  g,  given  by  Eq  12,  S,'g,  =  0  and  both 
S, ,  V,  have  rank  r.  A  simple  calculation  shows  that  for  the  Fletcher- Powell 
algorithm 

*,v*  ■  s,<s;v,}  >s;  -  j)  (28) 

Corollary  2,2,  After  a  partial  Inverse  step  (Eqs  18  and  11)  the  Flctcher- 
Powell-Oavldon  algorithm  can  be  restarted  from  the  rth  stage  with 

“■  (V  “)•!'  C;:)[C;;)'(i;:)]  ,C;;)1  m 


u 


It  will  terminate  In  not  more  than  h  -  t  further  steps.  An  example  la  given 
In  App  C\ 

In  principle  the  general  variable  metric  method  can  be  considered.  All 
that  la  required  is  that  independent  directions  s0,  .  .  .  ,  *(  }  ,  >u ,  y, ,  .  .  .  , 
be  found  such  that  for  some  initial  estimate  fl«  *  I),  SfYf*  *  R.  and  ii/g.  *  0. 

A  fifth  algorithm  can  be  derived  analogous  to  Algorithm  4  by  inserting 
ft  “  A  In  the  first  Y  *  of  Eq  Id  and  A  1  in  the  second.  Unfortunately  It  does  not 
lead  to  a  readily  computable  formula  as  do  the  others. 

6.  NUMERICAL  RESULTS 

Results  of  testing  these  algorithms  on  nonquadratic  functions  will  now  be 
given.  The  numerical  procedure  for  the  seven  schemes  considered  Is  as  follows. 

Given  f(r),  g(t),  and  possibly  A(x) ,  the  matrix  of  second  partial  deriva¬ 
tives  of  f(i)  evaluated  at  > ,  and  starting  at  t0  with  flu  =  k,  the  Initial  matrix, 


and  normalizing  <i( , 

(a)  Find  the  first  local  minimum  of  f(x,  +  a,d,) 

f(i„l>  ■  *><•  fU,  • 
a( 

HiS, 

(b)  Update  ft,  according  to  the  algorithm  used. 

Algorithm  1,  Projected  Gradient  Method  (P-G) 

(30) 

(31) 

«, .i  ' 

if,  -  fH.y.RH.v.'Vf v.H.v.t 

(32) 

for  »  *  «,  and  every  n  steps  H,  ■ 

1  R. 

Algorithm  2 

*i i  •  *  ‘ i  nt>»'  *■.'  >.*, 

(33) 

Algorithm  3 

(34) 
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05) 


Algorithm  4,  Fletcher- Powell- Davldon  (F-P-D) 

Algorithm  5,  Newton- Kaphson  (N-R) 

nf  ■  Uii.it  1  (38) 

The  program  uses  a  modified  Newton- Raphson  step  when  It  appears  that 
A(i|)  has  negative  eigenvalues  as  identified  during  the  process  of  Inversion 
of  A(i,)  using  the  Crout  procedure.  In  this  case  the  direction  of  move  la  along 
an  eigenvector  corresponding  to  a  nr  jative  eigenvalue.  By  this  means  a  region 
is  located  where  the  function  is  convex.10 

Algorithm  6,  Fletcher- Reeves  (F-R) 

*t(3  •  -  #0 

J,.i  -  <»..i  *  d.u;.l«,.i  m,1 

for  1  =  h  +  1,  and  every  h  +  1  steps  d,  *  -g,. 

Algorithm  7,  Projected  New  ton- Raphson  (P-N-R) 

H.H  “  M.-tH.y.KII,),!1 

R,.l  *  K,  .  (t,  Hly(Mll1>()V<);il,>() 

for  i  =  n,  and  every  n  steps  Ht  =  R,. 

The  last  method  investigates  the  effect  of  solving  R,  V,  *  $,  exactly  using 
the  schemes  of  Sec  4  in  the  absence  of  quadratlctty.  H,y,  provides  the  projec¬ 
tion  of  y,  orthogonal  to  [y0,  y, . y, 4  3.  Every  «  steps  R,  is  an  approxima¬ 

tion  to  A(r,)  1  and  a  Newton- Raphson  move  is  made. 

The  reset  form  of  the  algorithm  is  obtained  by  resetting  t  for  Algo¬ 
rithms  2,  3,  4,  and  7  to  R  and  restarting,  Algorithm  1  must  be  reset  every  n 
steps  and,  in  Algorithm  6,  <1„.,  is  reset  to-g,,.,  always. 

The  linear  minimlaation  is  performed  by  a  Fibonacci  search.  Cubic 
Interpolation  works  well  on  low-order  polynomial  functions  but  does  not  prove 
adequate  for  he  logarithmic  penalty  functions  used  in  the  Sequential  Uncon- 


(37) 


(38) 

(39) 
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strained  Minimization  Technique  (SUMT)  for  which  Algorithms  1  to  1 ,  plus 
several  others,  make  up  an  experimental  XMOVE  subroutine. 11  ,u 

Five  problems  were  considered.  The  data  fur  these,  and  other  informa¬ 
tion,  are  found  in  App  B.  The  numerical  results  are  of  course  strictly  com¬ 
parative  tor  each  problem.  In  each  case  the  fastest  algorithm  is  indicated  by 
encircled  and  italicized  iteration  numbers. 

Table  1  gives  results  for  Rosenbrock’s  banana-shaped  valley.19 

fix)  -  lQOtij  *  x2)2  *  (1  x,)s  (40) 


Starting  point  (x,  ,t2)  ■  (-1.2,  1.0);  the  numbers  quoted  are  iterations  until 
f(x*}  <  1013. 


TABLE  t 

Numerical  Results  of  Problem  1 


Algorithm  j 

J 

Mod* 

Normal 

Lir 

i,  p-c; 

42 

2 

18 

31 

3 

21 

37 

4,  K-P-D 

19 

35 

3.  N-B 

© 

6.  i-  a 

— 

© 

T,  P  A  R 

36 

¥ 

TABLE  2 


Numerical  Results  n<  Problem  2 


Algor  itHm 

Normal 

Rossi 

\  p  -y 

65 

i: 

36 

47 

3 

46 

47 

4,  K.P.U 

40 

49 

S,  N-R 

— 

6,  K-R 

— 

7,  P-N.H 

58 

Table  2  gives  results  for  a  test  function  credited  to  C.  F.  Wood  of  Westing- 
house  Research  Laboratory: 


fix) 


100(1,  -  xf>2  (1  -  x,)2  ♦  90(x4  -  x2)2  *  (1  -  x3)2 

-  lO.Mij  -  I)2  •  (x4  l)2  .  I9.8(x2  -  0(i4  -  1) 


(41) 


This  is  designed  to  have  a  nonoptimal  stationary  point  that  can  cause  prema¬ 
ture  convergence.  Initial  point  ( x, ,  x2,  x3 ,  x4)  =  (-3, -1, -3, -1),  and  the  num¬ 
ber  of  iterations  is  for  f(x‘)  ■  013. 

Table  3  shows  the  results  for  a  test  proolem  formulated  by  the  Shell  De¬ 
velopment  Company , 
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subject  to 


V°'I-  1.2 . 5 

I  a,(*(  i  bt,  i  .  1,2..  ...10 

This  is  a  linearly  constrain. ^d  problem,  which  for  particular  choices  of  ej ,  cti , 
4:  has  a  convex  objective.1*  For  ihia  problem,  SUMT  replaces  f(x)  by  f(x)  -  r 
Epilog*  Sj  (t)  for  a  parameter  r  >  0,  where  g,(x)  *  0  represents  the  ith  in¬ 
equality  constraint.  I!  **{t)  Is  the  solution  of  the  modified  problem,  then 
x  *(r)  -  i*  as  r  **  0  where  i*  is  the  solution  to  Eq  42. 


TABLE  3 

Nomoileal  Rofcult*  of  Prohlom  3 


Algorithm 

r  * 

1.0 

»  -  1.56 

X  10“* 

r  -  2.44 

x  icr4 

Normal 

Rout 

Normal 

Rout 

A 

Normal  j 

Rout 

1,  P-C 

26 

— 

ss 

_ . 

70 

2 

27 

22 

44 

41 

62 

60 

3 

33 

22 

50 

40 

67 

© 

4,  F-P-Q 

27 

22 

46 

40 

60 

56 

fi,  N-B 

© 

© 

0 

— 

6,  F.'l 

34 

>165 

Foil 

7,  P-N-R 

31 

so 

0 

67 

54 

Table  4  gives  results  for  the  dual  to  the  previous  problem,  here  the 
dual  problem  has  a  cubic  objective  and  quadratic  constraints.10 

TABLE  4 

Numerical  Rotulti  of  Probltm  4 


Algorithm 

r  = 

1.0 

r  »  0.25 

r  .  0.0625 

Normal 

P«Mi 

Normal 

Rotot 

Normal 

Rout 

t,  P-G 

_ 

120 

-m- 

1M 

_ 

211 

2 

134 

98 

I9S 

221 

lfl7 

3 

136 

10',* 

220 

i31 

246 

4.  F-P-f) 

406 

<§> 

473 

133 

500 

iBv 

5.  \.R 

0 

© 

— 

© 

— 

6.  F-R 

— 

>489 

— 

Foil 

— 

Foil 

7.  P-N-R 

166 

11.3 

198 

150 

230 

186 

Finally,  Tabic  5  shows  the  results  for  an  intriguing  problem  of  maximiz¬ 
ing  the  area  uf  a  hexagon  subject  to  the  constraint  that  its  maximum  diameter 
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is  1.  It  Is  Interesting  to  note  that  the  solution  is  not  a  regular  hexagon.11  The 
particular  formulation  used  had  9  variables  and  13  inequality  constraints,  al¬ 
though  there  is  a  certain  amount  of  redundancy. 

TABLE  5 


Num«ricj|  Knuitj  of  Problem  5 


r  t. 

1.0 

r-  10  2 

r  =  nr4 

r  -  itr4 

Nof  mol 

Normal 

1 

R«  >•(  j 

Normal 

R»mi 

Normal 

Ratal 

I,  P.G 

_ 

13 

_ 

55 

__ 

194 

_ 

278 

2 

17 

70 

34 

42 

308 

79 

326 

97 

3 

© 

© 

31 

© 

73 

© 

96 

0 

4,  K-P-D 

17 

18 

60 

40 

206 

215 

80 

5.  N-R 

18 

— 

© 

— 

© 

— 

© 

— 

6.  1  -R 

— 

13 

— 

55 

— 

194 

— 

278 

7,  P-N-R 

19 

13 

51 

58 

92 

91 

120 

101 

Summary  of  Results 

Tables  1  to  5  illustrate  primarily  the  difficulty  of  selecting  a  meaningful 
test  problem.  The  first  two  problems  are  smooth  polynomials  albflt  with  odd¬ 
shaped  valleys.  The  next  three  problems  are  basically  quadratics  or  cublcs 
with  infinite  barriers  of  the  penalty  functions  against  which  the  solutions  lie. 
This  means  that  their  hessian  matrices  A(x, )  become  very  ill-conditioned,  for 
the  binding  constraints  correspond  to  large  eigenvalues,  which  tend  to  infinity 
for  fixed  /  as  the  solution  *.(r)  approaches  the  constraint. 

If  second  derivatives  are  available,  the  Newton- Raphson  method  is  clearly 
the  best  for  all  five  problems. 

It  seems  that  on  smooth  polynomials  the  variable  metric  methods  are 
best  not  reset,  while  on  penalty  functions  they  are  best  reset.  Under  these 
conditions,  the  Fletcher- Powell- Davidon  algorithm  is  better  for  the  former 
class  of  problems  and  Algorithm  3  is  better  for  the  latter  class,  the  penalty 
functions.  It  is  remarkable  that  in  Table  4  Algorithms  ?,  3,  and  in  particular 
4  were  extremely  slow  when  not  reset. 

Finally  Algorithm  7,  the  projected  Newton- Raphson,  is  better  than  the 
projected  gradient,  showing  that  second-order  information  helps.  However, 


16 


the  separate  calculation  to  obtain  R„  exactly  (Eq  39)  does  not  seem  to  merit  the 
effort  compared  with  the  other  schemes. 

6.  CONCLUSIONS 

This  paper  has  unified  a  series  of  algorithms  in  a  single  framework. 
Basically,  this  ie  that  variable  metric  schemes  depend  on  the  generalized  solu¬ 
tion  to  a  set  of  linear  equations,  and  their  associated  projection  properties  give 
rise  to  conjugate  directions.  A  result  of  this  general  approach  has  been  three 
new  algorithms  whose  comparative  numerical  properties  are  promising.  Ex¬ 
tensions  to  this  work  will  be  found  elsewhere/ 
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Problem  1. 

Problem  2. 


Problem  3. 


subject  to  ij  a  0 


The  data  for 
Problem  4. 
Maximize 

subject  to 


Appendix  3 

TEST  PROBLEM  DATA 


Rosenbrock'e  Banana- Shaped  Valley1' 

/ill  -  1WXi2  -  tj)2  •  (!  -  J,>2 

Wood's  Function 

fit)  »  lOOUj  -  i j)  •  (1  -  t,!2  »  90ii+  -  Ij)2  *  l]  -  Jtj)2 
.  lO.lUj- !>J  -  U4  1)s*  19.SUj-  1)(i4-  1) 


Shell  Primal  Problem 


& 

A  ' 


/  ^ 

i  1  c. 

I  1  1*1  1 


I V? 


j  51  1,  2,  ....  5 


!§*■'■!  5  . “ 


<r, ,  Cjj,  dj ,  a(J ,  and  &,•  are  given  In  Table  Bl, 
Shell  Dual  Problem 


I  10  i  5  l-i 
fii)  s  b  >  -  i  i  £.,*.1. 

pi  i  i  ,-i  j-i  •» 1  * 


>  * 

•2  S  d  Is 
.  I  11 


10  1 

X  a  v  <  t  *2  1  f  x  *  3d  i2.  j  1  2  . .  5 
,  j  1M  -  »  *  ,  *  H  J  JH<V  1  . 

*i  2 0  .  -  1.2 .  S 

y ,  >  0  i  1.2 . 10 


21 


TABLE  B! 


Dole  far  Probltmi  3  and  4 

t«-  $) 


A 


1  2  3  4  5 


\ 

-15 

-27 

-36 

-IB 

12 

1 

30 

20 

-10 

32 

-10 

<rll 

2 

-20 

30 

-  6 

-31 

32 

s 

-10 

-  ft 

10 

6 

-10 

4 

32 

-31 

-  6 

30 

-20 

& 

-10 

32 

-10 

-20 

30 

Otk«r  frV 

<1 

4 

S 

10 

0 

2 

fc,‘ 

1 

-16 

2 

0 

1 

0 

-40 

-40 

-40 

2 

O 

-  2 

0 

0.4 

2 

-  2 

-  2 

-  2 

*1 

3 

-  3.5 

0 

2 

0 

0 

-  0.25 

-  0.5 

-  1 

4 

0 

-  2 

0 

-  4 

-  1 

-  4 

-  4 

-  4 

5 

0 

-  9 

-  2 

1 

-  2.8 

-  4 

-  8 

-16 

6 

2 

0 

-  4 

0 

0 

-  1 

-  2 

-  4 

7 

-  1 

-  1 

-  1 

-  1 

-  [ 

-40 

-40 

-40 

s 

-  1 

-  2 

-  3 

_  2 

-  1 

-60 

-60 

-60 

1 

2 

3 

4 

5 

5 

2.5 

2.5 

10 

1 

1 

1 

1 

1 

1 

l 

i 

Problem  5.  Hexagon  Problem 
Maximize  f  (x)  the  area  of  a  hexagon  where, 

f(«)  -  HI*!!*-  »ar,  +  I,I9-  *si9  ♦  X5I|  -  V7I 


subject  to  constraints  that  the  maximum  diameter  is  unity 

1  2  if  •  *4 
i>«5 

1  2  «|  ♦  (*2  ’  ‘V*3 

1>(i,  «j)2*<«2  t61l 

1  2  tij  r7>2  ♦  <*2  -  «g'2 

i  :•  Uj  -  is)2 .  (i4  j6i2 
1  (ij  - 17)2  ♦  (i4  jg)J 
1  2  l?  ♦  <*8  *g!2 


22 


and  that  the  figure  described  la  a  nondegenerate  hexagon 

V*i° 

ijlg  i  0 

«5‘,  ~  1SIT  >  0 
t9  >  0 

The  figure  described  has  one  diameter  on  tfca  vertical  axis.  The  problem  is 
not  expressed  in  its  simplest  form. 
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EXAMPLES  OP  RESTARTS 


Example  of  Restart  after  a  Partial  Inverse  Move  for 
the  Projected  Gradient  Scheme 

Suppose  m  «  2  and  A  and  b  are 


A  partial  Inverse  of  the  leading  2x2  suhriatrix  of  A  has  the  form 

/i  ov 
K  -  ('l  2  0  1-  H0 
\o  0  o/ 


This  la  used  as  the  starting  matrix. 

Iteration  0,  the  initial  point 

‘0 

Ho 

>0  *0 

10 

31 

10 

31 

, 

10 

41 

Iteration  1,  after  the  partial  inverse  step 

li 

#1 

>0  *0 

to 

0 

31  0 

-21 

0 

-31  31 

10 

10 

-31  0 

24 


H  is  now  reset  to 


(mi'  /i  t\r/2  i  ov2  iV|  V2  1  °\ 
0  1  1  j  M  1  l J  VI  t  1/ 


t 

l  i/6 

-l/S 

1/6 

-  (1/3 

2/S 

-1/3 

\  1/6 

-1/3 

1/6 

Iteration  2,  alter  one  projected  gradient  stage 

l3  *t  >2  ‘2 

00  0-10 
-1  0  0  20 

0  0  -10  -10 


alter  updating  using  H2y2  above. 

To  check,  a  ~  At*  +  b  should  be  zero. 


This  completes  the  problem  after  two  steps  instead  ol  the  normal  three. 


* 


Example  ol  a  Partial  Inverse  Move  lor  the 
Fletcher-Powell-Davidon  Scheme 


Suppose  lor  n  =  2,  A  and  b  are  given  by 


A  partial  inverse  of  the  leading  2x2  submatrix  gives 


Start,  iteration  0 


‘o 

Bo 

YO 

10 

31 

10 

31 

10 

41 
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Iteration  1,  after  the  partial  inverse  step 

fii  Vo 

10  0  -31 

-21  0  -31 

10  10  -31  o 

H2  is  now  reset  to 

/i  t  0V  /l  0  O'  /2  IV  [72  i  oyz  ivi  l  c\ 

Hj  -  (-1  2  oj.lo  1  o)-fi  llj'l  1  l/f  1  )]  \1  1  1/ 

\  o  o  o/  \o  o  i/  \o  i/L  \a  i/. 

/  7/6  -1/3  1/6 V 
.1-4/3  S/6  -1/3  j 

V  1/6  -1/3  1/6/ 

Iteration  2 

*2  «2  s2 
0  0  0  -10 

-1  0  0  20 

0  0  -10  -10 

Hj  after  updating  using  H2y2  and  s2  is 

/  2  -3  l 
H3  .  -£  6  -2 

V  1  -2  1 


Check  on  the  inverse 

f'l  1  0V/2  -3  IV  /I  0  0\ 

AHj  .  1  1  1  1  jf-3  6  -2  ]  .  ( 0  1  0  j 

\0  1  V\l  -2  1/  '0  0  1/ 


This  completes  the  problem  after  two  steps  instead  of  the  normal  three. 
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•  I  AasTHACT 


Two  basic  approaches  to  the  generation  of  conjugate  directions  are  con¬ 
sidered  for  the  problem  of  unconstrained  minimization  of  a  quadratic  function. 
Using  the  principle  oi  choosing  a  step  direction  orthogonal  to  the  previous 
gradient  changes,  a  projected  gradient  algorithm  and  a  class  of  variable  metric 
algorithms  are  derived.  Three  variants  of  the  class  are  developed  into  algo¬ 
rithms,  one  of  which  is  the  Fletcher-Powell-Davidon  scheme. 

Numerical  results  indicate  the  merits  of  the  new  algorithms  compared 
to  several  now  in  use,  Cor  a  variety  of  nonquadratic  problems. 
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